In this tutorial, we apply the CrowdTruth metrics to a free input crowdsourcing task for Person Annotation in video fragments. The workers were asked to watch a short video fragment (about 3 to 5 seconds) and then add tags that are relevant for the people appearing in it. The task was executed on FigureEight. For more crowdsourcing annotation task examples, click here.
To replicate this experiment, the code used to design and implement this crowdsourcing annotation template is available here: template, css, javascript.
This is a screenshot of the task as it appeared to workers:
A sample dataset for this task is available in this file, containing raw output from the crowd on FigureEight. Download the file and place it in a folder named data one level above this notebook, so that the relative path ../data/ used below resolves. Now you can check your data:
In [1]:
import pandas as pd
test_data = pd.read_csv("../data/person-video-free-input.csv")
test_data.head()
Out[1]:
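Besides head(), it is worth verifying that the columns used later by the pre-processing configuration are present; a quick sanity check (the column names are the ones declared in the configuration further below):

# the input and output columns that the configuration below relies on
test_data[["videolocation", "subtitles", "imagetags", "subtitletags", "keywords"]].head()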
Next, we declare a pre-processing configuration that defines how to interpret the raw crowdsourcing input. First, we import the default CrowdTruth configuration class:
In [2]:
import crowdtruth
from crowdtruth.configuration import DefaultConfig
Our test class inherits the default configuration DefaultConfig, while also declaring some additional attributes that are specific to the Person Annotation in Video task:

- inputColumns: list of input columns from the .csv file with the input data
- outputColumns: list of output columns from the .csv file with the answers from the workers
- annotation_separator: string that separates between the crowd annotations in outputColumns
- open_ended_task: boolean variable defining whether the task is open-ended (i.e. the possible crowd annotations are not known beforehand, like in the case of free text input); in the task that we are processing, workers type in free-text tags, so the task is open-ended and this variable is set to True
- annotation_vector: list of possible crowd answers, mandatory to declare when open_ended_task is False; since our task is open-ended, this list stays empty
- processJudgments: method that defines the processing of the raw crowd data; for this task, we spell-correct, clean up and lemmatize the free-text answers from the crowd
Some examples of possible processing functions for the crowd answers are given below:
In [3]:
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

from nltk.corpus import stopwords
from nltk.corpus import wordnet
# note: recent versions of the autocorrect package provide Speller() instead of spell
from autocorrect import spell

# run every word of every keyword through the spell-checker
def correct_words(keywords, separator):
    keywords_list = keywords.split(separator)
    corrected_keywords = []
    for keyword in keywords_list:
        corrected_keyword = []
        for word in keyword.split(" "):
            corrected_keyword.append(spell(word))
        corrected_keywords.append(" ".join(corrected_keyword))
    return separator.join(corrected_keywords)

# remove English stopwords from the keywords, keeping the negations "no" and "not"
def cleanup_keywords(keywords, separator):
    keywords_list = keywords.split(separator)
    stopset = set(stopwords.words('english'))
    filtered_keywords = []
    for keyword in keywords_list:
        cleanup = " ".join(filter(
            lambda word: str(word) not in stopset or str(word) == "no" or str(word) == "not",
            keyword.split()))
        filtered_keywords.append(cleanup)
    return separator.join(filtered_keywords)

# map an NLTK part-of-speech tag to the corresponding WordNet tag
def nltk2wn_tag(nltk_tag):
    if nltk_tag.startswith('J'):
        return wordnet.ADJ
    elif nltk_tag.startswith('V'):
        return wordnet.VERB
    elif nltk_tag.startswith('N'):
        return wordnet.NOUN
    elif nltk_tag.startswith('R'):
        return wordnet.ADV
    else:
        return None

# lemmatize every word of every keyword, using the POS tags to pick the right lemma
def lemmatize_keywords(keywords, separator):
    keywords_list = keywords.split(separator)
    lemmatized_keywords = []
    for keyword in keywords_list:
        nltk_tagged = nltk.pos_tag(nltk.word_tokenize(str(keyword)))
        wn_tagged = map(lambda x: (str(x[0]), nltk2wn_tag(x[1])), nltk_tagged)
        res_words = []
        for word, tag in wn_tagged:
            # fall back to the NOUN tag when the POS tag has no WordNet equivalent
            res_word = wordnet._morphy(str(word), tag if tag is not None else wordnet.NOUN)
            if res_word == []:
                # no lemma found, keep the original word
                res_words.append(str(word))
            elif len(res_word) == 1:
                res_words.append(str(res_word[0]))
            else:
                res_words.append(str(res_word[1]))
        lemmatized_keywords.append(" ".join(res_words))
    return separator.join(lemmatized_keywords)
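To get a feeling for what these functions do, we can chain them over a small, hypothetical example; the exact corrections depend on the autocorrect dictionary and the NLTK models installed locally:

# hypothetical worker tags, separated here by "|"
example = "runing man|the dogs are barking"
example = correct_words(example, "|")       # spelling, e.g. "runing" -> "running"
example = cleanup_keywords(example, "|")    # drops stopwords such as "the" and "are"
example = lemmatize_keywords(example, "|")  # e.g. "dogs" -> "dog"
print(example)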
The complete configuration class is declared below:
In [4]:
class TestConfig(DefaultConfig):
    inputColumns = ["videolocation", "subtitles", "imagetags", "subtitletags"]
    outputColumns = ["keywords"]

    # processing of an open-ended task
    open_ended_task = True
    annotation_vector = []

    def processJudgments(self, judgments):
        # pre-process the free-text answers from the crowd
        for col in self.outputColumns:
            # transform to lowercase
            judgments[col] = judgments[col].apply(lambda x: str(x).lower())
            # replace empty answers with a 'no tags' annotation
            judgments[col] = judgments[col].apply(lambda x: str(x).replace('[]','no tags'))
            # remove square brackets from annotations
            judgments[col] = judgments[col].apply(lambda x: str(x).replace('[',''))
            judgments[col] = judgments[col].apply(lambda x: str(x).replace(']',''))
            # remove the quotes around the annotations
            judgments[col] = judgments[col].apply(lambda x: str(x).replace('"',''))
            # apply the custom processing functions defined above
            judgments[col] = judgments[col].apply(lambda x: correct_words(str(x), self.annotation_separator))
            judgments[col] = judgments[col].apply(lambda x: "no tag" if cleanup_keywords(str(x), self.annotation_separator) == '' else cleanup_keywords(str(x), self.annotation_separator))
            judgments[col] = judgments[col].apply(lambda x: lemmatize_keywords(str(x), self.annotation_separator))
        return judgments
In [5]:
data, config = crowdtruth.load(
    file = "../data/person-video-free-input.csv",
    config = TestConfig()
)
data['judgments'].head()
Out[5]:
In [6]:
results = crowdtruth.run(data, config)
results is a dict object that contains the quality metrics for the video fragments, the annotations, and the crowd workers.
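The keys of the dictionary can be listed directly; the rest of this tutorial uses the units and workers entries:

# metric groups computed by crowdtruth.run
list(results.keys())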
The video fragment metrics are stored in results["units"]:
In [7]:
results["units"].head()
Out[7]:
The uqs column in results["units"] contains the video fragment quality scores, capturing the overall agreement between the workers on each video fragment. Here we plot its histogram:
In [8]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.hist(results["units"]["uqs"])
plt.xlabel("Video Fragment Quality Score")
plt.ylabel("Video Fragment")
Out[8]:
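Because results["units"] is used above as a pandas DataFrame, we can also sort it, for example to inspect the most ambiguous video fragments, i.e. the ones with the lowest agreement:

# video fragments with the lowest quality scores
results["units"].sort_values(by="uqs").head()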
The unit_annotation_score column in results["units"] contains the video fragment-annotation scores, capturing the likelihood that an annotation is expressed in a video fragment. For each video fragment, we store a dictionary mapping each annotation to its video fragment-annotation score.
In [9]:
results["units"]["unit_annotation_score"].head()
Out[9]:
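Since each entry is a dictionary of annotation scores, we can, for instance, extract the most likely annotation for each video fragment; a small sketch, assuming the dict-like entries shown above:

# the annotation with the highest video fragment-annotation score, per fragment
results["units"]["unit_annotation_score"].apply(
    lambda scores: max(scores, key=scores.get)).head()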
The worker metrics are stored in results["workers"]:
In [10]:
results["workers"].head()
Out[10]:
The wqs column in results["workers"] contains the worker quality scores, capturing the overall agreement between one worker and all the other workers.
In [11]:
plt.hist(results["workers"]["wqs"])
plt.xlabel("Worker Quality Score")
plt.ylabel("Workers")
Out[11]:
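Analogously to the video fragments, the workers can be ranked by their quality score, for example to inspect the potentially least reliable contributors (again treating results["workers"] as a pandas DataFrame, as above):

# workers with the lowest quality scores
results["workers"].sort_values(by="wqs").head()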